Preprocessing Methods for Word Alignment
Abstract
This paper compares four preprocessing approaches for word alignment: (1) sentence removal, (2) good points, (3) sentence duplication, and (4) removal of doubtful alignments. Two are statistically motivated and the other two are heuristics. We focus on the IBM Model 4 word aligner, which often runs into trouble when handling paraphrases, multi-word expressions, and non-literal translations. We assume that the output of IBM Model 4 is about 90% correct and only around 5% wrong.
Similar resources
Semi-supervised Word Alignment with Mechanical Turk
Word alignment is an important preprocessing step for machine translation. The project aims at incorporating manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...
Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes
We present an approach to using multiple preprocessing schemes to improve statistical word alignments. We show a relative reduction of alignment error rate of about 38%.
Iterative reordering and word alignment for statistical MT
Word alignment is necessary for statistical machine translation (SMT), and reordering as a preprocessing step has been shown to improve SMT for many language pairs. In this initial study we investigate whether both word alignment and reordering can be improved by iterating these two steps, since each depends on the other. Overall no consistent improvements were seen on the translation task, but...
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
This paper describes our phrase-based Statistical Machine Translation (SMT) system for the WMT10 Translation Task. We submitted translations for the German-to-English and English-to-German translation tasks. Compared to state-of-the-art phrase-based systems, we performed additional preprocessing and used a discriminative word alignment approach. The word reordering was modeled using POS informat...
Consensus versus Expertise: A Case Study of Word Alignment with Mechanical Turk
Word alignment is an important preprocessing step for machine translation. The project aims at incorporating manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...